Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike previous work on 3D pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a cooperative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered in real time by combining evidence from multiple views. We also introduce a dataset of approximately 6 million synthetic depth frames for pose estimation from multiple cameras, and we exceed state-of-the-art results on the Berkeley MHAD dataset.